Temporal coordination under uncertainty: initial results for the two agents case
We focus on the problem of decentralized planning and coordination for two
heterogeneous autonomous agents, having a common mission in an uncertain
environment. For example, we consider a helicopter UAV and a ground rover
cooperating in the exploration of a dangerous zone where communication is
limited, which forces decentralization of planning. After proposing a framework for decentralized planning, we underline the need for a planner under uncertainty taking continuous time into account in time-dependent
problems and present initial results on temporal planning under uncertainty
Towards a hybrid approach for intra-daily recourse strategies
This paper reports ongoing work on combining Supervised Learning (SL) and Mixed Integer Programming (MIP) to compute intra-daily recourse strategies in the field of energy management. Our goal is twofold. On the one hand, we wish to share with the research community the main open problems associated with the hybrid method developed in [3]. On the other hand, we analyze three key methodological bottlenecks related to the question of predicting which power units are the most important for these recourse strategies, and introduce solution methods for them, in order to make the optimization process compatible with operational constraints
Disentanglement by Cyclic Reconstruction
Deep neural networks have demonstrated their ability to automatically extract
meaningful features from data. However, in supervised learning, information
specific to the dataset used for training, but irrelevant to the task at hand,
may remain encoded in the extracted representations. This remaining information
introduces a domain-specific bias, weakening the generalization performance. In
this work, we propose splitting the information into a task-related
representation and its complementary context representation. We propose an
original method, combining adversarial feature predictors and cyclic
reconstruction, to disentangle these two representations in the single-domain
supervised case. We then adapt this method to the unsupervised domain
adaptation problem, consisting of training a model capable of performing on
both a source and a target domain. In particular, our method promotes
disentanglement in the target domain, despite the absence of training labels.
This enables the isolation of task-specific information from both domains and a
projection into a common representation. The task-specific representation
allows efficient transfer of knowledge acquired from the source domain to the
target domain. In the single-domain case, we demonstrate the quality of our
representations on information retrieval tasks and the generalization benefits
induced by sharpened task-specific representations. We then validate the
proposed method on several classical domain adaptation benchmarks and
illustrate the benefits of disentanglement for domain adaptation
Temporal Markov Decision Problems: Formalization and Resolution
This thesis addresses the question of planning under uncertainty within a time-dependent changing environment. The original motivation for this work came from the problem of building an autonomous agent able to coordinate with its
uncertain environment, this environment being composed of other agents communicating their intentions or of non-controllable processes for which some discrete-event model is available. We investigate several approaches for modeling continuous time-dependency in the framework of Markov Decision Processes (MDPs), leading us to a definition of Temporal Markov Decision Problems. Our approach then focuses on two separate paradigms. First, we investigate time-dependent problems as implicit-event processes and describe them through the formalism of Time-dependent MDPs (TMDPs). We extend the existing results concerning optimality equations and present a new Value Iteration algorithm based on piecewise polynomial function representations in order to solve a more general class of TMDPs. This paves the way to a more general discussion on parametric actions in MDPs with hybrid state and action spaces and continuous time. Second, we investigate the
option of separately modeling the concurrent contributions of exogenous events. This explicit-event modeling approach leads to the use of Generalized Semi-Markov Decision Processes (GSMDPs). We establish a link between the general framework of Discrete Event System Specification (DEVS) and the formalism of GSMDPs, allowing us to build sound discrete-event compatible simulators. Then we introduce a simulation-based Policy Iteration approach for
explicit-event Temporal Markov Decision Problems. This algorithmic contribution brings together results from simulation theory, forward search in MDPs, and statistical learning theory. The implicit-event approach was tested on a
specific version of the Mars rover planning problem and on a drone patrol mission planning problem, while the explicit-event approach was evaluated on a subway network control problem
TiMDPpoly: An Improved Method for Solving Time-Dependent MDPs
We introduce TiMDPpoly, an algorithm designed to solve planning problems with durative actions, under probabilistic uncertainty, in a non-stationary, continuous-time context. Mission planning for autonomous agents such as planetary rovers or unmanned aircraft often corresponds to such time-dependent planning problems. These problems can be modeled in the framework of Time-dependent Markov Decision Processes (TiMDPs). We analyze the TiMDP optimality equations in order to exploit their properties. Then, we focus on the class of piecewise polynomial models in order to approximate TiMDPs, and introduce several algorithmic contributions which lead to the TiMDPpoly algorithm for TiMDPs. Finally, our approach is evaluated on an unmanned aircraft mission planning problem and on an adapted version of the well-known Mars rover domain
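To make the idea of a piecewise polynomial model concrete, the following is a minimal sketch of a time-dependent value function stored as one polynomial per time interval. It is an illustration only, not the algorithm from the paper: the class name `PiecewisePolynomial`, its fields, and the numbers in the example are all assumptions introduced here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PiecewisePolynomial:
    """Function of time defined by one polynomial per interval (illustrative only).

    breakpoints: sorted times t_0 < t_1 < ... < t_n
    coeffs[i]:   coefficients, lowest degree first, used on [t_i, t_{i+1})
    """
    breakpoints: Sequence[float]
    coeffs: List[Sequence[float]]

    def __call__(self, t: float) -> float:
        if not self.breakpoints[0] <= t <= self.breakpoints[-1]:
            raise ValueError("t is outside the function's domain")
        # Locate the interval containing t; the final breakpoint belongs to the last piece.
        i = min(bisect_right(self.breakpoints, t) - 1, len(self.coeffs) - 1)
        # Horner's scheme: evaluate the polynomial from the highest degree down.
        result = 0.0
        for c in reversed(self.coeffs[i]):
            result = result * t + c
        return result


# Hypothetical value of an action as a function of its start time:
# constant at 10 until t = 2, then decaying linearly (20 - 5t) to 0 at t = 4.
value = PiecewisePolynomial(breakpoints=[0.0, 2.0, 4.0],
                            coeffs=[[10.0], [20.0, -5.0]])

print(value(1.0), value(3.0), value(4.0))
```

A representation of this kind keeps a continuous-time function in a finite description (breakpoints and coefficients), which is the reason such models are attractive for approximating time-dependent problems.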
Approximate Policy Iteration for Generalized Semi-Markov Decision Processes: an Improved Algorithm
In the context of time-dependent problems of planning under uncertainty, most of the problem's complexity comes from the concurrent interaction of simultaneous processes. Generalized Semi-Markov Decision Processes represent an efficient formalism to capture both concurrency of events and actions and uncertainty. We introduce GSMDP with observable time and hybrid state space and present a new algorithm based on Approximate Policy Iteration to generate efficient policies. This algorithm relies on simulation-based exploration and makes use of SVM regression. We experimentally illustrate the strengths and weaknesses of this algorithm and propose an improved version based on the weaknesses highlighted by the experiments
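The general shape of simulation-based policy iteration can be sketched on a toy problem. The example below is not the algorithm from the paper: it uses a hypothetical deterministic five-state chain instead of a GSMDP, leaves out concurrency, continuous time and the regression step (SVM regression in the abstract), and every name in it (`step`, `rollout_q`, `improve`) is an assumption made for illustration. It shows only the loop of evaluating a policy by simulated rollouts and then acting greedily on those estimates.

```python
from typing import Callable, Dict, Tuple

GOAL = 4            # terminal state of a hypothetical 5-state chain (states 0..4)
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.9         # discount factor


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy deterministic simulator: reward 1 when the goal state is reached."""
    nxt = max(0, min(GOAL, state + action))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def rollout_q(state: int, action: int, policy: Callable[[int], int],
              horizon: int = 20) -> float:
    """Estimate Q(state, action) by simulation: take `action` once, then follow `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward, done = step(state, action)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        action = policy(state)
    return total


def improve(policy: Callable[[int], int]) -> Dict[int, int]:
    """One improvement step: in each state, pick the action with the best rollout estimate."""
    return {s: max(ACTIONS, key=lambda a: rollout_q(s, a, policy))
            for s in range(GOAL)}


# Start from a poor policy (always move left) and iterate evaluation and improvement.
policy: Callable[[int], int] = lambda s: -1
for _ in range(4):
    table = improve(policy)
    policy = table.__getitem__

print(table)
```

In a stochastic simulator each estimate would be averaged over several rollouts, and in large or continuous state spaces the table would be replaced by a learned approximator, which is where a regression method would enter.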
On the Locality of Action Domination in Sequential Decision Making
In the field of sequential decision making and reinforcement learning, it has been observed that good policies for most problems exhibit a significant amount of structure. In practice, this implies that when a learning agent discovers an action is better than any other in a given state, this action actually happens to also dominate in a certain neighbourhood around that state. This paper presents new results proving that this notion of locality in action domination can be linked to the smoothness of the environment's underlying stochastic model. Namely, we link the Lipschitz continuity of a Markov Decision Process to the Lipschitz continuity of its policies' value functions and introduce the key concept of influence radius to describe the neighbourhood of states where the dominating action is guaranteed to be constant. These ideas are directly exploited in the proposed Localized Policy Iteration (LPI) algorithm, which is an active learning version of Rollout-based Policy Iteration. Preliminary results on the Inverted Pendulum domain demonstrate the viability and the potential of the proposed approach
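The link between Lipschitz continuity and local action domination can be illustrated with a standard argument, which is not necessarily the paper's exact definition of the influence radius. Assume that, for every action a, the map s -> Q(s, a) is Lipschitz with constant L. If the best action at state s beats the runner-up by a gap g, then at any state s' with distance d(s, s') < g / (2L) the best action's value has dropped by at most L*d and any other action's value has risen by at most L*d, so the same action still dominates. The function name `domination_radius` and its parameters are assumptions introduced for this sketch.

```python
from typing import Sequence


def domination_radius(q_values: Sequence[float], lipschitz_q: float) -> float:
    """Radius around a state within which the greedy action cannot change.

    q_values:    Q(s, a) for every action a at the state s (at least two actions)
    lipschitz_q: assumed Lipschitz constant of s -> Q(s, a), shared by all actions
    """
    if len(q_values) < 2:
        raise ValueError("need at least two actions to compare")
    if lipschitz_q <= 0:
        raise ValueError("the Lipschitz constant must be positive")
    # Gap between the best action and the runner-up at this state.
    best, runner_up = sorted(q_values, reverse=True)[:2]
    gap = best - runner_up
    # Each value can move by at most L * d, so the ordering holds while 2 * L * d < gap.
    return gap / (2.0 * lipschitz_q)


# Hypothetical Q-values for three actions at one state, with L = 0.5.
radius = domination_radius([1.0, 0.4, 0.2], lipschitz_q=0.5)
print(radius)
```

A larger gap or a smoother model (smaller L) gives a larger radius, which matches the intuition that a clearly dominant action stays dominant over a wider neighbourhood.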